library(tidyverse)
library(ggplot2)
battles <- read.csv("source_data/battles.csv") 
battles_kings <- battles %>% drop_na(defender_king)

There are 38 battles in total in the War of the Five Kings, of which 35 have a king recorded for both the attacker and the defender. Let’s take a look at each king’s share of participation in those battles.

# Count battles per king, separately as attacker and as defender
attackers <- battles_kings %>%
        group_by(attacker_king) %>%
        summarise(n = n()) %>%
        rename(King = attacker_king, n_attack = n)
defenders <- battles_kings %>%
        group_by(defender_king) %>%
        summarise(n = n()) %>%
        rename(King = defender_king, n_defend = n)
# full_join() keeps kings that appear on only one side; their missing count becomes 0
total <- full_join(attackers, defenders, by = "King") %>%
        mutate(n_attack = replace_na(n_attack, 0L),
               n_defend = replace_na(n_defend, 0L)) %>%
        mutate(n_total = n_attack + n_defend) %>%
        mutate(perc = n_total / sum(n_total)) %>%
        arrange(perc) %>%
        mutate(labels = scales::percent(perc))
ggplot(data = total, aes(x="", y = n_total, fill = King)) +
        geom_bar(stat = "identity", width=1) +
        coord_polar("y", start=0) +
        theme_void() + geom_text(aes(label = labels),
                                 position = position_stack(vjust = 0.5))

We can see that Joffrey/Tommen Baratheon took part in the most battles. That makes sense, since the Seven Kingdoms were legally under the rule of House Baratheon and they had to defend that rule. Robb Stark is second only to Joffrey/Tommen Baratheon in the number of battles. That is also reasonable: Joffrey had Eddard Stark, Robb’s father, executed, so Robb would want revenge.
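
The attacker and defender tallies can also be computed in one pass by reshaping the two king columns into long form. The following is a minimal sketch on a small made-up data frame: the rows in `toy_battles` are illustrative and not taken from `battles.csv`, and only the column names `attacker_king` and `defender_king` are borrowed from the data used here.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; the kings "A", "B", "C" are placeholders.
toy_battles <- tibble(
  attacker_king = c("A", "A", "B", "C"),
  defender_king = c("B", "C", "A", NA)
)

participation <- toy_battles %>%
  drop_na(attacker_king, defender_king) %>%   # keep battles with both kings recorded
  pivot_longer(c(attacker_king, defender_king),
               names_to = "role", values_to = "King") %>%
  count(King, name = "n_total") %>%           # attack and defence counted together
  mutate(perc = n_total / sum(n_total))

print(participation)
```

Because every battle contributes exactly two rows after `pivot_longer()`, no join and no NA replacement are needed.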

# Wins as attacker: the attacker's outcome is "win"
attacker_win <- battles_kings %>% filter(attacker_outcome == "win") %>%
        group_by(attacker_king) %>% summarise(n = n()) %>%
        rename(King = attacker_king, nwin_attack = n)
# Wins as defender: the attacker's outcome is "loss"
defender_win <- battles_kings %>% filter(attacker_outcome == "loss") %>%
        group_by(defender_king) %>% summarise(n = n()) %>%
        rename(King = defender_king, nwin_defend = n)
king_win <- full_join(attacker_win, defender_win, by = "King") %>%
        full_join(total, by = "King") %>%
        # Kings without a win on one side get 0 instead of NA
        mutate(nwin_attack = replace_na(nwin_attack, 0L),
               nwin_defend = replace_na(nwin_defend, 0L)) %>%
        mutate(nwin = nwin_attack + nwin_defend) %>%
        # Each king's share of all wins, so the pie labels sum to 100%
        mutate(perc = nwin / sum(nwin)) %>%
        arrange(perc) %>%
        mutate(labels = scales::percent(perc))

ggplot(data = king_win, aes(x ="" , y= perc, fill = King)) + 
        geom_bar(stat = "identity", width = 1) +
        coord_polar("y", start=0) +
        theme_void() + geom_text(aes(label = labels),
                                 position = position_stack(vjust = 0.5))
## Warning: Removed 1 rows containing missing values (`position_stack()`).
## Removed 1 rows containing missing values (`position_stack()`).

ggplot(data = king_win, aes(x = nwin , y= King, fill = King)) + 
        geom_bar(stat = "identity", show.legend = FALSE) + xlab("Number of wins")
## Warning: Removed 1 rows containing missing values (`position_stack()`).

Joffrey/Tommen Baratheon wins the most battles. That is not hard to guess, since the real power behind the throne is Tywin Lannister, who is rich and wily. Robb Stark is a renowned battle commander, so it is no surprise that he also wins often.

Next, I would like to explore the relationship between army size and winning.

battle_results <- battles %>% 
        mutate(size_diff = attacker_size - defender_size, 
               outcome_num = ifelse(attacker_outcome == "win", 1,0))


ggplot(data = battle_results, aes(x = attacker_size, y = defender_size, color = attacker_outcome))+
        geom_point()
## Warning: Removed 22 rows containing missing values (`geom_point()`).

We can see that the attacker sometimes loses even with superior numbers.
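
To put a number on what the scatter plot suggests, the outcomes can be cross-tabulated against whether the attacker had the larger army. This is a minimal sketch on made-up values: `toy_sizes` and the helper column `attacker_larger` are hypothetical, and only the column names `attacker_size`, `defender_size` and `attacker_outcome` come from the data used here.

```r
library(dplyr)

# Hypothetical toy data; sizes and outcomes are for illustration only.
toy_sizes <- tibble(
  attacker_size    = c(1000, 5000, 3000, NA, 8000),
  defender_size    = c(4000, 2000, 3500, 1000, 6000),
  attacker_outcome = c("win", "loss", "win", "win", "loss")
)

upsets <- toy_sizes %>%
  filter(!is.na(attacker_size), !is.na(defender_size)) %>%  # both sizes must be recorded
  mutate(attacker_larger = attacker_size > defender_size) %>%
  count(attacker_larger, attacker_outcome)                  # battles per combination

print(upsets)
```

Rows with `attacker_larger == TRUE` and `attacker_outcome == "loss"` are the cases where the larger attacking army still lost.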

Let’s go a step further and regress the battle outcome on the size difference.

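# Regression of battle outcome on size difference.
# glm() defaults to family = gaussian, so this fit is a linear probability model;
# a logistic regression would need family = binomial.
logit_fit <- glm(outcome_num ~ size_diff, data = battle_results)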
summary(logit_fit)
## 
## Call:
## glm(formula = outcome_num ~ size_diff, data = battle_results)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.77020  -0.07857   0.22067   0.25557   0.35007  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.444e-01  1.139e-01   6.535 1.32e-05 ***
## size_diff   -8.591e-06  4.442e-06  -1.934   0.0736 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.1937582)
## 
##     Null deviance: 3.4375  on 15  degrees of freedom
## Residual deviance: 2.7126  on 14  degrees of freedom
##   (22 observations deleted due to missingness)
## AIC: 23.011
## 
## Number of Fisher Scoring iterations: 2

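The effect of the size difference between attacker and defender is not significant at the 5% level (p = 0.074), but interestingly the estimate is negative. In other words, a larger size advantage for the attacker is associated with a lower chance of the attacker winning. Note that only 16 battles have both sizes recorded, and that the fit above uses the Gaussian family, so the coefficient is a change in win probability per additional soldier rather than a change in log-odds. Size difference is probably not the only factor that decides a battle.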
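
For a true logistic fit, `family = binomial` has to be passed to `glm()`. The following is a minimal sketch on made-up numbers: `toy_results` is a hypothetical data frame that only reuses the column names `size_diff` and `outcome_num`, so its estimates say nothing about the real battles.

```r
# Hypothetical toy data; not taken from battles.csv.
toy_results <- data.frame(
  size_diff   = c(-5000, -3000, -1000, 0, 1000, 2000, 4000, 6000),
  outcome_num = c(1, 1, 0, 1, 1, 0, 1, 0)
)

# family = binomial uses the logit link, i.e. a logistic regression
logit_toy <- glm(outcome_num ~ size_diff, data = toy_results, family = binomial)
summary(logit_toy)

# Coefficients are on the log-odds scale; exponentiate to get odds ratios
odds_ratios <- exp(coef(logit_toy))
print(odds_ratios)

# Predicted probability of an attacker win at a few size differences
pred <- predict(logit_toy,
                newdata = data.frame(size_diff = c(-2000, 0, 2000)),
                type = "response")
print(pred)
```

With the binomial family, a negative `size_diff` coefficient can be read as lower odds of an attacker win as the attacker's size advantage grows.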